ABSTRACT


This paper describes a regression estimator which builds on the investigation of the covariance structure in the full space of explanatory variables and response variable. It is robust since it downweights outlying observations. A comparison to other robust estimators is included.
1.  INTRODUCTION

In recent years a variety of estimators for regression parameters have been proposed. The driving force behind this kind of research is the non-robustness of the ordinary least squares (OLS) estimator. Just as in the case of location estimation, the OLS solution is unduly influenced by departures from the "usual" model assumptions. Two different approaches to robustification have been proposed. The first one is due to Huber (1973) (see also Mosteller and Tukey (1977)) and works by "Huberizing" residuals. The second one is based on Hampel's work (1974) and is due to Krasker and Welsch (1982). These methods bound the "influence" on the estimated parameter vector.

In  this  paper  we  will  describe  an  approach  to  the  regression 
problem  which  is  based  on  covariance  estimation.  Instead  of 
attacking  the  problem  of  estimating  the  regression  parameters 
directly  as  solutions  to  some  "normal  equation",  we  will  view  the 
problem  in  a  larger  context  and  try  to  summarize  what  we  learn  from 
the  data  in  an  estimated  covariance  matrix  which  contains  all  the 
information  we  need.  This  approach  has  the  advantage  that  the 
response  variable  and  the  explanatory  variables  are  treated 
symmetrically.  We  can  therefore  compute  any  of  the  possible 
regressions  —  i.e.  any  choice  of  response  —  in  a  single  run 
without  having  to  recalculate  weights. 


In Section 2 we give the formulas and discuss one particular estimator based on an affine equivariant covariance estimator. Section 3 contains a (limited) comparison with other robust regression estimators through asymptotics and experimental sampling.

2.  ROBUST  REGRESSION  THROUGH  ROBUST  COVARIANCES 
2.1  DEFINITION  OF  THE  ESTIMATOR 

Let $z_i' = (x_i', y_i)$, $i = 1, \dots, n$, be $n$ vectors in $\mathbb{R}^{p+1}$ which are created by $n$ independent observations from the stochastic model

$$Y = \beta' X + E \qquad (2.1)$$

with fixed, unknown regression parameter $\beta \in \mathbb{R}^p$, random carrier $X \in \mathbb{R}^p$ and random error $E$ independent of the carrier.

In order to describe the probability structure in (2.1) two distributions are needed: first the error distribution $H(e/\sigma)$ and second the carrier distribution $G(\cdot)$. Together with the independence assumption in (2.1) these two determine the distribution of $Z' = (X', Y)$. As we have indicated, there is at least one more parameter of primary importance, namely the "scale" $\sigma$ in our error distribution. All of the bounded influence research takes place in a model like (2.1).

The classical solution to the inference problems posed by (2.1) is based on the principle of least squares. In recent years several alternative robust estimators $b$ of $\beta$ have been proposed. They can all be conveniently described by "normal equations" in the residuals $r_i = y_i - b'x_i$:

[1] $\sum_{i=1}^n x_i r_i = 0$.

[2] $\sum_{i=1}^n x_i \, \psi(r_i/s) = 0$, Huber (1973).

Here $s$ is an auxiliary or simultaneous estimate of the error "scale" and $\psi(\cdot)$ is a somewhat arbitrary function which is usually chosen in such a way that the efficiency at a central model like $H(\cdot) = \Phi(\cdot)$ is high.

[3] $\sum_{i=1}^n w_i x_i \, \psi(r_i/s) = 0$, Mallows (1975).

Here $w_i$ is a weight which depends on $x_i$ (and possibly all the other $x_j$'s).

[4] $\sum_{i=1}^n w_i x_i \, \psi\big(r_i/(s w_i)\big) = 0$, Schweppe (1975).

This form is optimal with respect to the heuristic notion of bounded influence (Hampel (1974) and Krasker and Welsch (1982)). A review of these estimators can be found in Maronna, Bustos and Yohai (1979).


[1] gives the ordinary least squares estimator (OLS). [2] corresponds to the classical approach to robust regression and there are two popular choices for the $\psi$-function:

(i) $\psi(x) = \psi_k(x) = \max(-k, \min(k, x))$

(ii) $\psi(x) = \mathrm{bsq}_k(x) = (x/k)\,\big(1 - (x/k)^2\big)^2$ if $|x| < k$, and $0$ otherwise.
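For concreteness, the two $\psi$-functions (i) and (ii) can be written as vectorized functions. This is a minimal sketch assuming NumPy; the tuning constants 1.345 and 4.685 in the usage lines are common textbook choices and are not taken from this paper.

```python
import numpy as np

def psi_huber(x, k):
    """Huber's psi (i): psi_k(x) = max(-k, min(k, x))."""
    return np.clip(np.asarray(x, dtype=float), -k, k)

def psi_bisquare(x, k):
    """Biweight psi (ii) in the scaling used above: (x/k)(1 - (x/k)^2)^2 for |x| < k, else 0."""
    x = np.asarray(x, dtype=float)
    t = x / k
    return np.where(np.abs(x) < k, t * (1.0 - t**2) ** 2, 0.0)

x = np.array([-10.0, -1.0, 0.0, 0.5, 3.0])
print(psi_huber(x, 1.345))     # large |x| are clipped at -1.345 / +1.345
print(psi_bisquare(x, 4.685))  # -10 lies outside (-k, k) and gets psi = 0 (redescending)
```

The Huber function stays at $\pm k$ for large arguments, while the bisquare returns to zero, so very large residuals contribute nothing at all to the normal equation.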

The second one is preferable since it gives excellent protection against heavy-tailed error distributions. [3] and [4] are bounded influence estimators (see Hampel (1974)). Asymptotic optimality theory tells us that estimators of the form [4] are preferable. Dividing and multiplying each term in the sum by $r_i/(s w_i)$ we get

[4'] $\displaystyle\sum_{i=1}^n x_i r_i \, \frac{\psi\big(r_i/(s w_i)\big)}{r_i/(s w_i)} = 0$


which we immediately recognize as the normal equation of a weighted least squares problem. The optimal weights are of the form

$$\frac{\psi_k\big(r_i/(s w_i)\big)}{r_i/(s w_i)},$$

where $\psi_k$ is as in (i) above and $w_i = (x_i' S^{-1} x_i)^{-1/2}$ ($S$ an estimated covariance matrix of the carriers). The theory behind this statement can be found in Krasker and Welsch (1982).
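The weighted least squares reading of [4'] suggests an iteratively reweighted least squares (IRLS) scheme. The sketch below is illustrative only and makes assumptions not found in the text: $S$ is taken as the plain moment matrix $X'X/n$ of the carriers, the scale $s$ is the normalized median absolute deviation of the current residuals (instead of a simultaneous scale estimate), OLS is the starting value, $k = 1.5$, and the helper name `krasker_welsch_irls` is hypothetical.

```python
import numpy as np

def krasker_welsch_irls(X, y, k=1.5, n_iter=200, tol=1e-10):
    """IRLS sketch for the Schweppe form [4'] with Huber's psi_k (hypothetical helper).

    Assumptions not taken from the paper: S = X'X/n, s = normalized MAD of the
    current residuals, OLS as starting value.
    """
    n = X.shape[0]
    S_inv = np.linalg.inv(X.T @ X / n)
    # carrier weights w_i = (x_i' S^{-1} x_i)^(-1/2)
    w = 1.0 / np.sqrt(np.einsum("ij,jk,ik->i", X, S_inv, X))
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ b
        s = 1.4826 * np.median(np.abs(r - np.median(r)))
        t = r / (s * w)
        # weighted least squares weights psi_k(t)/t (equal to 1 at t = 0)
        with np.errstate(divide="ignore", invalid="ignore"):
            omega = np.where(t == 0.0, 1.0, np.clip(t, -k, k) / t)
        Xw = X * omega[:, None]
        b_new = np.linalg.solve(Xw.T @ X, Xw.T @ y)
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b

# Usage: clean data plus ten bad leverage points
rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 2))
beta = np.array([1.0, -2.0])
y = X @ beta + 0.5 * rng.standard_normal(n)
X[:10] = 10.0
y[:10] = -50.0
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_kw = krasker_welsch_irls(X, y)
print(b_ols, b_kw)
```

On such data OLS is pulled far away by the leverage points, while the reweighted fit stays near the true slopes because the weight $\psi_k(t)/t$ shrinks with the product of residual size and carrier distance.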


We started this section by introducing the vectors $z_i \in \mathbb{R}^{p+1}$, which we get by joining the carriers $x_i$ to the observations $y_i$. Since regression deals with the linear dependence between these two elements, it is quite natural to summarize our data $z_1, \dots, z_n$ at a first stage by a covariance matrix and then extract the specific information we are primarily interested in (about $\beta$ and $\sigma$) at the second stage.

Remark: To implement the above program it seems advantageous to take out the constant term in our regression model. This term corresponds to a carrier identically equal to 1, and if we allow for it, we will be in the rather special position that all vectors $z_i$ lie in a $p$-dimensional affine subspace of $\mathbb{R}^{p+1}$. The natural way to proceed is to estimate the constant term via simultaneous location estimation. This way our target will be the value of the regression function at a central carrier value. The formulas we present are all written for the case where we do not estimate location. These formulas can, however, easily be modified.
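The remark leaves the modification with a location term open. One possible version, sketched here purely as an assumption (the paper does not give these formulas), iterates a weighted mean alongside the weighted scatter matrix, reads off the slope as in (2.3), and obtains the constant term from the fitted location as $\mu_y - b'\mu_x$. Using the same weight function for location and scatter is a simplification, and `location_scatter_regression` is a hypothetical name.

```python
import numpy as np

def location_scatter_regression(X, y, u, n_iter=500, tol=1e-10):
    """Hypothetical variant of (2.2) with simultaneous location (not given in the paper).

    Iterates mu = sum(w_i z_i)/sum(w_i) and W = (1/n) sum w_i (z_i - mu)(z_i - mu)'
    with w_i = u(d_i^2), d_i^2 = (z_i - mu)' W^{-1} (z_i - mu), then applies (2.3).
    """
    Z = np.column_stack([X, y])
    n, q = Z.shape
    mu, W = Z.mean(axis=0), np.cov(Z, rowvar=False, bias=True)
    for _ in range(n_iter):
        D = Z - mu
        w = u(np.einsum("ij,jk,ik->i", D, np.linalg.inv(W), D))
        mu_new = w @ Z / w.sum()
        D = Z - mu_new
        W_new = (D * w[:, None]).T @ D / n
        done = np.max(np.abs(W_new - W)) < tol and np.max(np.abs(mu_new - mu)) < tol
        mu, W = mu_new, W_new
        if done:
            break
    p = q - 1
    V, c, e = W[:p, :p], W[:p, p], W[p, p]
    b = np.linalg.solve(V, c)
    return mu[p] - b @ mu[:p], b, e - b @ V @ b   # intercept, slopes, s^2

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2))
y = 3.0 + X @ np.array([1.0, -1.0]) + rng.standard_normal(50)

# u == 1 reproduces OLS with an intercept; the Huber-type u (k = 9) is a hypothetical choice
a_ls, b_ls, s2_ls = location_scatter_regression(X, y, lambda d2: np.ones_like(d2))
a_rob, b_rob, s2_rob = location_scatter_regression(
    X, y, lambda d2: np.minimum(1.0, 9.0 / np.maximum(d2, 1e-12)))
print(a_ls, b_ls, a_rob, b_rob)
```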

Let us define a robust variance-covariance matrix $W^*$ through the implicit equation

$$\frac{1}{n}\sum_{i=1}^n u\big(z_i' W^{*-1} z_i\big)\, z_i z_i' = W^* \qquad (2.2)$$

where $u(\cdot)$ is a weight function such that $W^*$ is Fisher consistent at a multivariate Gaussian model. Such M-estimators of covariance matrices are discussed and examined in Maronna (1976) (see also Huber (1981)). Now we partition $W^*$:

$$W^* = \begin{pmatrix} V^* & c^* \\ c^{*\prime} & e^* \end{pmatrix},$$

where $V^*$ is the $p \times p$ covariance matrix of the carriers, $c^*$ is the $p \times 1$ vector of carrier-response cross terms and $e^*$ is the total variance of the response. In analogy to the OLS case we finally put

$$\text{estimated parameter: } b = V^{*-1} c^*, \qquad \text{estimated error variance: } s^2 = e^* - b' V^* b. \qquad (2.3)$$
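A direct way to compute (2.2) is the fixed-point iteration suggested by the equation itself: start from the plain second-moment matrix, compute $d_i^2 = z_i' W^{-1} z_i$, reweight, and repeat; (2.3) then only needs the partition of the result. This is a minimal NumPy sketch. The starting value, stopping rule and iteration limit are assumptions, convergence is taken for granted rather than checked, and the weight function in the usage lines is (3.2) from Section 3, which is Fisher consistent only for $p = 1$.

```python
import math
import numpy as np

def cov_m_estimator(Z, u, n_iter=1000, tol=1e-12):
    """Fixed-point iteration for (2.2): W = (1/n) sum_i u(z_i' W^{-1} z_i) z_i z_i'."""
    n = Z.shape[0]
    W = Z.T @ Z / n                                   # start: plain second moments
    for _ in range(n_iter):
        d2 = np.einsum("ij,jk,ik->i", Z, np.linalg.inv(W), Z)   # z_i' W^{-1} z_i
        W_new = (Z * u(d2)[:, None]).T @ Z / n
        if np.max(np.abs(W_new - W)) < tol * np.max(np.abs(W)):
            return W_new
        W = W_new
    return W

def regression_from_cov(W):
    """(2.3): partition W = [[V, c], [c', e]], return b = V^{-1} c and s^2 = e - b'Vb."""
    p = W.shape[0] - 1
    V, c, e = W[:p, :p], W[:p, p], W[p, p]
    b = np.linalg.solve(V, c)
    return b, e - b @ V @ b

K = 5.77

def u_32(c):
    """Weight function (3.2); Fisher consistent at the Gaussian only when z is in R^2."""
    return np.minimum(c, K) / (c * (1.0 - math.exp(-K / 2)))

def u_one(c):
    """u == 1: (2.2) reduces to the plain second-moment matrix."""
    return np.ones_like(c)

rng = np.random.default_rng(4)
x = rng.standard_normal(100)
y = 2.0 * x + rng.standard_normal(100)
y[:5] = -20.0                                     # five gross errors in the response
Z = np.column_stack([x, y])
b_rob, s2_rob = regression_from_cov(cov_m_estimator(Z, u_32))
b_ls, s2_ls = regression_from_cov(cov_m_estimator(Z, u_one))
print(b_rob, s2_rob, b_ls, s2_ls)
```

With $u \equiv 1$ the result coincides with least squares through the origin (and $s^2 = \mathrm{RSS}/n$), which is a convenient sanity check of the partition step.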


If we now look back, we realize that the system of equations (2.2) is equivalent to

$$\frac{1}{n}\sum_{i=1}^n w_i x_i r_i = 0, \qquad \frac{1}{n}\sum_{i=1}^n w_i r_i^2 = s^2, \qquad \frac{1}{n}\sum_{i=1}^n w_i x_i x_i' = V^*, \qquad (2.4)$$

where the weights are $w_i = u(z_i' W^{*-1} z_i) = u(x_i' V^{*-1} x_i + r_i^2/s^2)$ and $r_i = y_i - x_i' b$.

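Proof: In order to get (2.4) from (2.2) we have to partition according to the blocks of $W^*$. E.g. the first $p$ rows of (2.2) read

$$\frac{1}{n}\sum_{i=1}^n w_i x_i (x_i', y_i) = (V^*, c^*),$$

and multiplying from the right by $(-b', 1)'$ gives $\frac{1}{n}\sum_i w_i x_i r_i = c^* - V^* b = 0$, the first equation in (2.4).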
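The key step behind the weights in (2.4) is the block-inverse identity $z_i' W^{*-1} z_i = x_i' V^{*-1} x_i + r_i^2/s^2$, which holds for any positive definite matrix partitioned as in (2.3). The following NumPy sketch checks it numerically for a randomly generated $W$ (an arbitrary test matrix, not an estimate), and checks the three equations of (2.4) in the case $u \equiv 1$, where $W = Z'Z/n$ solves (2.2) exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2
X = rng.standard_normal((n, p))
y = X @ np.array([0.5, -1.0]) + rng.standard_normal(n)
Z = np.column_stack([X, y])

def split(W):
    """Partition as in (2.3) and return V, b, s^2."""
    V, c, e = W[:p, :p], W[:p, p], W[p, p]
    b = np.linalg.solve(V, c)
    return V, b, e - b @ V @ b

# (a) block-inverse identity for an arbitrary positive definite test matrix W
B = rng.standard_normal((p + 1, p + 1))
W = B @ B.T + (p + 1) * np.eye(p + 1)
V, b, s2 = split(W)
d2_full = np.einsum("ij,jk,ik->i", Z, np.linalg.inv(W), Z)
r = y - X @ b
d2_split = np.einsum("ij,jk,ik->i", X, np.linalg.inv(V), X) + r**2 / s2

# (b) with u == 1, W = Z'Z/n solves (2.2) and all weights in (2.4) equal 1
V1, b1, s2_1 = split(Z.T @ Z / n)
r1 = y - X @ b1
eq1 = X.T @ r1 / n            # first equation of (2.4): should vanish
eq2 = np.mean(r1**2) - s2_1   # second equation: mean squared residual equals s^2
eq3 = X.T @ X / n - V1        # third equation: carrier moments equal V*
print(np.max(np.abs(d2_full - d2_split)), np.max(np.abs(eq1)), eq2, np.max(np.abs(eq3)))
```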

Remarks:

(1) From (2.4) we immediately infer that in the special case where $u(\cdot) \equiv 1$ we are almost back at the usual OLS estimates for $\beta$ and $\sigma^2$.

(2) It would be of interest, for the sake of high breakdown (see Huber (1981)), to estimate the covariance matrix via a high breakdown estimator (Donoho and Huber (1983) and Stahel (1981)). At the moment this does not seem to be practical, but we nevertheless want to point out this important possibility. We plan to report our results on an M-estimator approach as in (2.2), hoping that they will be helpful.

The solution $W^*$ in (2.2), i.e. the estimated covariance matrix, satisfies a very broad equivariance relation. If we transform the data linearly, $\tilde z_i = A z_i$, $i = 1, \dots, n$, the covariance matrix is transformed as

$$W^*(\tilde z_1, \dots, \tilde z_n) = A \, W^*(z_1, \dots, z_n) \, A' \qquad (2.5)$$

(affine equivariance).

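The parameters of interest in (2.1) are $\beta$ and $\sigma$. The equivariance relations we might wish to impose on an estimator of these parameters are:

(i) $\tilde y_i = \eta\,(y_i + \gamma' x_i)$, $i = 1, \dots, n$ $\Longrightarrow$ $b(\tilde y_1, \dots, \tilde y_n) = \eta\,\big(b(y_1, \dots, y_n) + \gamma\big)$ and $s(\tilde y_1, \dots, \tilde y_n) = |\eta|\, s(y_1, \dots, y_n)$ (regression equivariance),

and

(ii) $\tilde y_i = y_i$, $\tilde x_i = M x_i$, $i = 1, \dots, n$ $\Longrightarrow$ $b(\tilde y_1, \dots, \tilde y_n) = M'^{-1}\, b(y_1, \dots, y_n)$ and $s(\tilde y_1, \dots, \tilde y_n) = s(y_1, \dots, y_n)$ (carrier equivariance). $\qquad (2.6)$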

Regression equivariance means that an estimator behaves reasonably if we add an exactly linear function of the carriers to the response. The carriers are viewed as fixed and our answer $b$ is always defined in relation to them. Carrier equivariance is quite a different concept: if the parametrization of our model (2.1) is changed in a linear fashion, it makes sense to apply the inverse transformation to the parameters.

The importance of (ii) lies in the fact that by transforming our parametrization we can sometimes get parameters which we can estimate better, i.e. with decreased variability. Carrier equivariance lets us move between these different parametrizations without inconsistencies.
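It is easily seen that affine equivariance (2.5) of our covariance estimator is a sufficient condition to ensure (2.6).

Proof: Put

$$A = \begin{pmatrix} I & 0 \\ \eta\gamma' & \eta \end{pmatrix} \quad \text{or} \quad A = \begin{pmatrix} M & 0 \\ 0 & 1 \end{pmatrix}$$

in (2.5). For the first choice, (2.5) gives $\tilde V^* = V^*$ and $\tilde c^* = \eta\,(c^* + V^*\gamma)$, hence $\tilde b = \eta\,(b + \gamma)$ and $\tilde s^2 = \eta^2 s^2$; the second choice gives $\tilde V^* = M V^* M'$, $\tilde c^* = M c^*$, hence $\tilde b = M'^{-1} b$ and $\tilde s^2 = s^2$.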

Remark: The robust estimators described at the beginning are most easily described as solutions to a weighted least squares scheme where the weights have to be found iteratively. It is easily seen that regression equivariance and carrier equivariance are obtained in the case where the final weights are invariant under the transformations given in (2.6). Quantities which fulfill this invariance requirement are standardized residuals $r_i/s$ and squared standardized distances in the carrier space $x_i' V^{*-1} x_i$, where $s$ is a scale equivariant function of the residuals and $V^*$ is an affine equivariant function of the carriers. In the case of the classical Huber estimator the final weights depend only on $r_i/s$. This estimator, fully iterated, is therefore equivariant. If we use a few-step version, i.e. we do not fully iterate but rather use a pre-fixed number of steps in some iteration scheme, the resulting estimator will not be equivariant, but hopefully nearly so.

The Krasker-Welsch estimator is equivariant as well. Its weights depend on the product of the standardized residual and the standardized carrier distance. The Mallows estimator, finally, will be equivariant if we are careful in the choice of carrier weights: they should depend only on the standardized carrier distances.


2.2.  ASYMPTOTIC  PROPERTIES  OF  THE  REGRESSION  ESTIMATOR  (2.3) 


In this subsection we want to discuss the asymptotic properties of a regression estimator defined through a covariance M-estimator (2.2). We saw in (2.4) that it corresponds to a weighted least squares estimator whose weights depend on the sum of the squared standardized residual and the squared standardized carrier distance. Maronna (1976) develops the asymptotic theory for covariance M-estimators. We plan to use these results and to look at what happens if we apply (2.3). But first let us examine the influence function (see Hampel (1974)).

In order to simplify the formulas we will first consider the case where the carrier distribution $G(\cdot)$ (see beginning of Section 2) is symmetric with respect to each coordinate, i.e. the $2^p$ vectors $(\pm x_1, \dots, \pm x_p)$ for all possible choices of signs are identically distributed with distribution $G(\cdot)$. If in addition the error distribution $H(x/\sigma)$ is symmetric, the influence on the $j$th component $b_j$ ($j = 1, \dots, p$) in (2.3) is

$$IF_{b_j;G,H}(x_0, y_0) = \frac{u(d_0^2)\, r_0\, x_{0j}}{S_j} \qquad (2.7)$$

where $d_0^2 = (x_0', y_0)\, W^{-1}(G,H)\, (x_0', y_0)'$, $r_0 = y_0 - \beta'(G,H)\, x_0$, and $S_j = V_{jj} + E_{G,H}\big[u'(d^2)\, x_j^2 r^2 \, 2/\sigma^2\big]$ (note that $x_j$ refers to the $j$th component of $x$, and $x_{0j}$ to the $j$th component of $x_0$).


In order to understand this formula we have to remember that the influence function is an asymptotic "tool" and that therefore the population values of our estimators appear in the formula. $W(G,H)$ is defined through

$$W(G,H) = E_{G,H}\big[u(d^2)\,(x', y)'(x', y)\big], \qquad d^2 = (x', y)\, W^{-1}(G,H)\, (x', y)',$$

and can be partitioned just as $W^*$ (see (2.3)). $V_{jj}$ denotes the $j$th diagonal element of $W(G,H)$.

Remarks:

(1) If we do not impose the symmetry requirements on the carrier distribution $G(\cdot)$, the influence on the vector $b$ in (2.3) is

$$IF_{b;G,H}(x_0, y_0) = u(d_0^2)\, r_0\, A^{-1} x_0,$$

where $A_{jk} = V_{jk} + E_{G,H}\big[u'(d^2)\, 2r^2/\sigma^2 \, x_j x_k\big]$.

Proof: Our estimator can be put into the usual M-estimator form, i.e. we estimate the parameter $\xi = (\beta, \sigma, V)$ based on the data $z_1, \dots, z_n$ via $\sum_i \psi(z_i, \xi) = 0$. Now we can apply the standard formulas to get the influence function (see Huber (1981)). Our $\psi$-function splits into three parts according to

$$\psi_0(z, \xi) = u(d^2)\, x r, \qquad \psi_1(z, \xi) = u(d^2)\, r^2 - \sigma^2, \qquad \psi_2(z, \xi) = u(d^2)\, x x' - V.$$

The matrix $D = E_{G,H}[\partial\psi/\partial\xi]$ has the block form

$$D = \begin{pmatrix} -A & 0 \\ 0 & D_{22} \end{pmatrix},$$

where $A$ is the matrix which turns up in the influence for $b$ and $D_{22}$ is the block belonging to $(\sigma, V)$. This block form is ensured as soon as the error distribution $H(x/\sigma)$ is symmetric around 0.

(2) The influence function for a general class of robust regression estimators can be found in Maronna, Bustos and Yohai (1979).

(3) The influence function gives a description of the bias introduced by infinitesimal perturbations of an ideal model. It turns out that a bounded influence is desirable (see Hampel (1971)). Under the strong symmetry conditions we have

$$\sup_{z} \big\| IF_{b;G,H}(z) \big\| = \mathrm{const}(G,H) \, \sup_{c \ge 0} \big(c\, u(c)\big).$$

Proof: see (2.7) and note that $W(G,H) = \mathrm{diag}(V_{11}, \dots, V_{pp}, \sigma^2)$.

Our estimator ((2.2) and (2.3)) therefore has a bounded influence function provided the weight function $u(c)$ goes to zero fast enough as $c$ gets large, i.e. $c\,u(c)$ stays bounded. This is not surprising since the weight depends both on the size of the residual and on the distance from the center in carrier space.
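For the setting of Section 3 ($p = 1$, weight function (3.2) with $k = 5.77$, standard Gaussian carriers and errors) the supremum can be evaluated explicitly. The sketch assumes $\beta = 0$ (justified by regression equivariance), so that Fisher consistency gives $W(G,H) = I$ and $\sigma = 1$. Since $u'(c) = 0$ for $c < k$, $u'(c) = -k/\big(c^2(1 - e^{-k/2})\big)$ beyond, and $x^2y^2/d^4$ has mean $1/8$ independently of $d^2 \sim \chi^2_2$, one gets $S = 1 - k e^{-k/2}/\big(4(1 - e^{-k/2})\big)$; this closed form is our own derivation, not a formula from the paper. The code compares $\sup_c c\,u(c)/(2S)$ with a brute-force grid maximum of $|IF|$ from (2.7); the value, about 3.34, matches the covariance entry for $N(0,1)$ carriers in Table IV.

```python
import numpy as np

k = 5.77
a = 1.0 - np.exp(-k / 2)          # normalizing constant of (3.2)

def u(c):
    """Weight function (3.2), c = squared distance > 0."""
    return np.minimum(c, k) / (c * a)

# S_1 at G = H = N(0, 1) with beta = 0, where W(G, H) = I and sigma = 1
S = 1.0 - k * np.exp(-k / 2) / (4.0 * a)

# Brute force: |IF| from (2.7), u(d0^2) * r0 * x0 / S with r0 = y0, on a grid
g = np.linspace(-8.0, 8.0, 801)
x0, y0 = np.meshgrid(g, g)
d2 = x0**2 + y0**2
with np.errstate(divide="ignore", invalid="ignore"):
    IF = np.where(d2 > 0, u(d2) * y0 * x0 / S, 0.0)
gamma_grid = np.max(np.abs(IF))

# Closed form: |x0 y0| <= d0^2 / 2 and sup_c c u(c) = k / a
gamma_closed = (k / a) / (2.0 * S)
print(round(gamma_grid, 4), round(gamma_closed, 4))
```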

The asymptotic normality of our estimates can be proved using a standard result due to Huber (1967). The proof follows the same steps as Maronna (1976). Unfortunately asymptotic normality is only proved under the extremely stringent condition that the joint distribution of $x$ and $y$ is spherically symmetric up to an affine transformation. This assumption can possibly be relaxed, but another proof is required. The other regularity conditions needed are boundedness and monotonicity conditions on $u(c)$ and $c\,u(c)$, respectively. Under all of these assumptions the estimate $b_j$ (see (2.3)) is asymptotically Gaussian with mean $\beta_j$ and asymptotic variance

$$\frac{E_{G,H}\big[u^2(d^2)\, r^2 x_j^2\big]}{S_j^2}$$

(see (2.7)); furthermore, the components of $b$ are asymptotically independent.

Maronna (1976) showed that the breakdown properties of covariance estimates of the form (2.2) are not encouraging. As the dimensionality of the z-space increases, i.e. if we add more carriers, the breakdown point of $W^*$ (see (2.2)) necessarily decreases like 1/(1 + dimensionality). The regression estimator (2.3), which is based on $W^*$, has the same breakdown point: if we add a contamination along an arbitrary regression line and let the contaminating carrier value go to infinity, the estimator breaks down.


We have discussed above two types of properties: first, the behavior at the presumed central model, which is described by the asymptotic parameters; and second, the breakdown point, which is a simple indicator for the behavior under severe deviations from the central model. The asymptotic influence curve stands somewhere in between. On the one hand it provides a decomposition of the asymptotic variance at the central model; on the other hand it is useful as an indicator for the behavior under perturbations of the linearity assumption.

In the next section we will use the formulas derived above to obtain numerical values. But these asymptotic findings are of course of secondary importance. What we really need to know is the small sample behavior. To see how well the asymptotics predict the corresponding small sample values, some simulation results for samples of size 20 are included.

3.  COMPARISON  OF  REGRESSION  ESTIMATORS 

In order to compare the various estimators discussed in Section 2 we plan to do a small Monte Carlo study. To further simplify the questions involved we will restrict attention to the simple case $p = 1$, i.e. to the model

$$y_i = \beta x_i + E_i, \qquad i = 1, \dots, n \qquad (3.1)$$

with random carriers (independent of the errors). For model (3.1) the question of how to handle a constant term does not arise, and (2.2) together with (2.3) defines a reasonable estimator of $\beta$. To fully specify this estimator we need to define a weight function $u(c)$. A somewhat typical choice is

$$u(c) = \frac{\psi_k(c)}{c\,\big(1 - e^{-k/2}\big)} \qquad (3.2)$$

where $\psi_k(c) = \max(-k, \min(c, k))$ denotes Huber's $\psi$-function. The justification for this particular form of weight function lies in the fact that near the Gaussian error model a good behavior of the asymptotic efficiency can be expected. The weight function is chosen in such a way that the resulting estimator is Fisher consistent at the Gaussian model.
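The normalizing constant in (3.2) can be checked directly. At a bivariate standard Gaussian, $d^2 = z'z$ is $\chi^2_2$-distributed and Fisher consistency of (2.2) requires $E[u(d^2)\,d^2] = 2$; since $E[\min(d^2, k)] = 2(1 - e^{-k/2})$ for $d^2 \sim \chi^2_2$, dividing by $1 - e^{-k/2}$ achieves exactly that. The NumPy sketch below confirms this by simulation; the seed and sample size are arbitrary.

```python
import math
import numpy as np

K = 5.77

def u_weight(c, k=K):
    """Weight function (3.2): psi_k(c) / (c (1 - exp(-k/2))), for squared distances c > 0."""
    c = np.asarray(c, dtype=float)
    return np.clip(c, -k, k) / (c * (1.0 - math.exp(-k / 2)))

rng = np.random.default_rng(0)
d2 = rng.chisquare(2, size=1_000_000)      # d^2 = z'z for z ~ N(0, I_2)
fisher = np.mean(u_weight(d2) * d2)        # should be close to the dimension, 2
print(fisher)
```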

We still have the constant $k$ at our disposal. It determines the trade-off between resistance and efficiency. For $k = 5.77$ the asymptotic variance of our estimator of $\beta$ at the standard Gaussian model (i.e. both the error distribution and the carrier distribution are standard Gaussian) is 1.05, i.e. the efficiency is 95%. This appears to be a reasonable normalization.
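The value 1.05 can be reproduced from the asymptotic variance formula $E[u^2(d^2)\,r^2x_j^2]/S_j^2$ given after (2.7). At the standard Gaussian model with $p = 1$ (taking $\beta = 0$, so that $W(G,H) = I$ and $r = y$), polar coordinates give closed forms: with $q = e^{-k/2}$ and $a = 1 - q$, the numerator is $(a - kq/2)/a^2$ and $S = 1 - kq/(4a)$. These expressions are our own derivation rather than formulas from the paper, so the sketch also cross-checks them by Monte Carlo (seed and sample size are arbitrary).

```python
import math
import numpy as np

def asymptotic_variance_gaussian(k):
    """E[u^2 r^2 x^2] / S^2 for weight (3.2) at standard Gaussian carrier and error."""
    q = math.exp(-k / 2)            # P(chi^2_2 > k)
    a = 1.0 - q                     # normalizing constant of (3.2)
    # E[min(T, k)^2] = 8a - 4kq for T ~ chi^2_2, and E[x^2 y^2 | d^2] = d^4 / 8
    num = (8.0 * a - 4.0 * k * q) / (8.0 * a**2)
    S = 1.0 - k * q / (4.0 * a)
    return num / S**2

def asymptotic_variance_mc(k, n=1_000_000, seed=0):
    """Monte Carlo version of the same expectation."""
    rng = np.random.default_rng(seed)
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    d2 = x**2 + y**2
    a = 1.0 - math.exp(-k / 2)
    u = np.minimum(d2, k) / (d2 * a)
    du = np.where(d2 > k, -k / (a * d2**2), 0.0)      # u'(c)
    S = 1.0 + np.mean(du * x**2 * y**2 * 2.0)
    return np.mean(u**2 * y**2 * x**2) / S**2

v = asymptotic_variance_gaussian(5.77)
print(round(v, 3), round(1.0 / v, 3))   # variance and efficiency relative to OLS (variance 1)
```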

3.1  ASYMPTOTIC  NUMBERS 

The sampling situations we take into consideration are polyGaussians of the form

$$(1 - \varepsilon)\, N(0,1) + \varepsilon\, N(0, \tau^2) =: (\varepsilon/\tau^2) \qquad (3.3)$$

and will be used both as a model for the errors and for the carriers.
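Sampling from (3.3) only requires choosing, independently for each draw, between the two Gaussian components. A minimal NumPy sketch follows; the function name `poly_gaussian` is hypothetical, and the usage line checks the variance $(1 - \varepsilon) + \varepsilon\tau^2$, which is 1.8 for $(0.1/9)$.

```python
import numpy as np

def poly_gaussian(rng, eps, tau2, size):
    """Draw from (eps/tau2) = (1 - eps) N(0, 1) + eps N(0, tau2) as in (3.3)."""
    scale = np.where(rng.random(size) < eps, np.sqrt(tau2), 1.0)
    return scale * rng.standard_normal(size)

rng = np.random.default_rng(0)
z = poly_gaussian(rng, 0.1, 9.0, 1_000_000)
print(z.var())   # close to (1 - eps) + eps * tau2 = 1.8
```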

Table I contains the values of the asymptotic variance for the regression estimator under consideration.
Table I: asymptotic variance of $b$ (see (2.3)). $(\varepsilon_1/\tau_1^2)$ refers to the carrier distribution, $(\varepsilon_2/\tau_2^2)$ denotes the error distribution (see (3.3)).

                       errors (ε2/τ2²)
carriers (ε1/τ1²)      N(0,1)   0.1/9   0.1/25   0.05/100
N(0,1)                 1.05*    1.33    1.39     1.16
0.1/9                  0.71     0.90    0.96     0.79
0.1/25                 0.55     0.69    0.74     0.61
0.05/100               0.77     0.97    1.03     0.86

(These values were computed numerically with a 24-point Gaussian quadrature procedure.)

*: normalization as discussed above

Since the covariance estimator (2.3) has bounded influence if used with the weight function (3.2), its asymptotic behavior depends on the distribution of the carriers. Table I shows that this dependence is quite remarkable. The first column combines standard Gaussian errors with increasingly heavy-tailed carrier distributions. The least squares estimators are of course optimal in these situations, with asymptotic variances of 1.00, 0.56, 0.29, and 0.17. The asymptotic efficiency of our estimator in these situations is therefore 95%, 79%, 53%, and 22%. It appears that the normalization of bounded influence regression estimators is a non-trivial matter if we want to achieve a fair comparison with estimators which do not treat outliers in carrier space any differently. Most of the information about the slope parameter $\beta$ (see (3.1)) in the first column of Table I lies exactly in the points far out in carrier space. Disregarding this information results in a big loss.
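The least squares numbers quoted above follow from $\mathrm{var}(b_{OLS}) = \sigma^2/E[X^2]$ with $\sigma = 1$ and $E[X^2] = (1 - \varepsilon) + \varepsilon\tau^2$ under (3.3). A short Python check of this arithmetic, using the first column of Table I:

```python
carriers = [("N(0,1)", 0.0, 1.0), ("0.1/9", 0.1, 9.0), ("0.1/25", 0.1, 25.0), ("0.05/100", 0.05, 100.0)]
cov_estimator = [1.05, 0.71, 0.55, 0.77]        # first column of Table I

ols_var, efficiency = [], []
for (name, eps, tau2), v in zip(carriers, cov_estimator):
    second_moment = (1 - eps) + eps * tau2      # E[X^2] under (3.3)
    ols_var.append(1.0 / second_moment)         # sigma^2 / E[X^2] with sigma = 1
    efficiency.append(ols_var[-1] / v)
    print(f"{name:9s} OLS variance {ols_var[-1]:.2f}, efficiency {efficiency[-1]:.0%}")
```

Computed from the unrounded OLS variance 0.556, the second efficiency comes out at 78%; the 79% quoted in the text corresponds to the rounded value 0.56.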

Maronna, Bustos and Yohai (1979) give tables similar to Table I for a Mallows-type estimator and the Hampel-Krasker estimator. Table II shows the numbers for the Mallows estimator; the Hampel-Krasker estimator behaves similarly, but a bit worse overall because of our normalization (the Hampel-Krasker estimator is the "most robust" if we normalize this way!).

Table II: asymptotic variance of Mallows estimator with Huber weight function. $(\varepsilon_1/\tau_1^2)$ refers to the carrier distribution, $(\varepsilon_2/\tau_2^2)$ denotes the error distribution (see (3.3)).

                       errors (ε2/τ2²)
carriers (ε1/τ1²)      N(0,1)   0.1/9   0.1/25   0.05/100
N(0,1)                 1.05     1.42    1.51     1.42
0.1/9                  0.66     0.86    0.98     0.85
0.1/25                 0.46     0.59    0.66     0.59
0.05/100               0.35     0.45    0.51     0.45

Table  III  contains  the  efficiencies  of  the  Mallows  estimator  with 
respect  to  the  covariance  estimator  (2.3). 

Table III: (asymptotic variance of covariance estimator)/(asymptotic variance of Mallows estimator). $(\varepsilon_1/\tau_1^2)$ refers to the carrier distribution, $(\varepsilon_2/\tau_2^2)$ denotes the error distribution (see (3.3)).

                       errors (ε2/τ2²)
carriers (ε1/τ1²)      N(0,1)   0.1/9   0.1/25   0.05/100
N(0,1)                 1.00     1.00    0.97     0.92
0.1/9                  1.05     1.07    1.05     0.99
0.1/25                 1.17     1.15    1.15     1.07
0.05/100               2.03     2.02    1.98     1.89

As  the  carrier  distribution  gets  heavier  tails  the  covariance 
estimator  is  outperformed  by  the  Mallows  estimator.  But  in  most 
cases  the  relative  loss  compared  to  the  Mallows  estimator  is  not 
serious. 

It should be noted that the robust estimator based on covariance is not intended as a replacement for other bounded influence fitters. Its weight function (see (2.4)) has the appealing feature that the response variable and the carriers are interchangeable. All possible choices of "the response" can be easily fitted. In some situations this can be helpful.

Table IV finally shows the gross-error sensitivities of the different estimators together with their asymptotic breakdown points. The gross-error sensitivity is defined as the maximum of the Euclidean norm of the influence curve (see Hampel (1974)) and therefore serves as a description of the behavior under deviations from the central model. In a sense, these two quantities are our tools for measuring the resistance of the estimators.

Table IV: gross-error sensitivity and breakdown point under standard Gaussian error distribution. $(\varepsilon_1/\tau_1^2)$ refers to the carrier distribution (see (3.3)).

                              Covariance   Mallows   Hampel-Krasker
breakdown point               0.16         0.31      0.50
gross-error sensitivity,
carriers (ε1/τ1²):
  N(0,1)                      3.34         3.40      2.94
  0.1/9                       2.86         2.66      2.22
  0.1/25                      2.63         2.15      1.73
  0.05/100                    2.99         2.07      1.64

(For the last two columns see Maronna, Bustos and Yohai (1979), Table 4.)

The asymptotic optimality property of the Hampel-Krasker estimator is clearly visible in Table IV: that particular estimator minimizes the gross-error sensitivity under a fixed asymptotic variance (all of this at the central model). This is exactly the reason why we are interested in using it.

3.2  RESULTS  OF  A  SIMULATION  EXPERIMENT

Table V contains empirical variances of the regression estimator (2.3) for samples of size 20. The weight function $u(\cdot)$ is described at the beginning of Section 3, and the model is (3.1) with the distributions (3.3).


1  /I 

Table  V:  empirical  variances  of  a  '  b  (see  (2.3))  for  samples  of 

i  2 

size  n*20,  (tj/tj  )  refers  to  the  carrier  distribution,  (tj/^  ) 

denotes  the  error  distribution  (see  (3.3)) 


•l/rl 


N(0, 1 )  0.1/25  0.05/100 


N(0,1)  j 

1.17 

1.88 

1.61 

0.1/25  | 

0.69 

1.16 

0.94 

0.05/100  | 

1.14 

1.34 

1.14 

0.1/100  | 

0.57 

0.94 

— 

(The  numbers  in  this  ttble  are  based  on  500  replicas  and  the 
estimated  relative  errors  are  between  9%  and  15%) 
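A sketch of how such a table can be produced: draw carriers and errors from (3.3), compute $b$ from (2.2) and (2.3) with the weight function (3.2), and take $n$ times the empirical variance of $b$ over the replicas. This is an illustrative NumPy version, not the authors' simulation code. The true slope $\beta = 1$, the seed, the starting value and the stopping rule are assumptions (by regression equivariance the value of $\beta$ should not matter), and its output will differ from the table by Monte Carlo error.

```python
import math
import numpy as np

K = 5.77
A = 1.0 - math.exp(-K / 2)

def u(c):
    return np.minimum(c, K) / (c * A)           # weight function (3.2)

def slope_estimate(x, y, n_iter=500, tol=1e-10):
    """b from (2.2)-(2.3) for the no-intercept model (3.1), z_i = (x_i, y_i)."""
    Z = np.column_stack([x, y])
    n = len(x)
    W = Z.T @ Z / n
    for _ in range(n_iter):
        d2 = np.einsum("ij,jk,ik->i", Z, np.linalg.inv(W), Z)
        W_new = (Z * u(d2)[:, None]).T @ Z / n
        converged = np.max(np.abs(W_new - W)) < tol * np.max(np.abs(W))
        W = W_new
        if converged:
            break
    return W[0, 1] / W[0, 0]                    # b = V^{-1} c with p = 1

def poly_gaussian(rng, eps, tau2, size):
    """(1 - eps) N(0, 1) + eps N(0, tau2), as in (3.3)."""
    scale = np.where(rng.random(size) < eps, math.sqrt(tau2), 1.0)
    return scale * rng.standard_normal(size)

def empirical_variance(carrier, error, n=20, reps=500, beta=1.0, seed=0):
    """Monte Carlo estimate of Var(sqrt(n) b) under (3.1) with (3.3) carriers and errors."""
    rng = np.random.default_rng(seed)
    b = np.empty(reps)
    for j in range(reps):
        x = poly_gaussian(rng, *carrier, n)
        y = beta * x + poly_gaussian(rng, *error, n)
        b[j] = slope_estimate(x, y)
    return n * b.var(ddof=1)

print(empirical_variance((0.0, 1.0), (0.0, 1.0)))   # standard Gaussian carriers and errors
```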

A comparison with Table I shows that the asymptotic variance is in all situations lower than the actual small sample variance. But otherwise the pattern in these estimated variances follows the behavior of Table I. We do not have the corresponding numbers for the Mallows estimator. A look at Table 5 in Maronna, Bustos and Yohai (1979) shows, however, that the step from asymptotic variances to small sample values very closely follows the behavior we have seen in our numbers. It therefore appears that the asymptotic theory is a good enough approximation to the small sample ($n = 20$) behavior.

As pointed out in Maronna, Bustos and Yohai (1979), the distribution of the estimated regression coefficient is apparently non-Gaussian in a lot of sampling situations. This makes the comparison between different estimators more difficult, since the variance (or mean square error) might not be the criterion to use. From our simulation studies it appears that the distribution of the slope estimator is heavier tailed than a Gaussian. This is true in the case of the covariance estimator for either a "contaminated" carrier distribution or a heavier tailed error distribution.

We  would  finally  like  to  point  to  a  future  research  topic. 
Donoho  and  Huber  have  recently  pointed  out  the  intuitive  appeal  of 
the  small  sample  breakdown  point  (see  Donoho  and  Huber  (1983)  and 
Huber  (1984))  and  it  would  be  useful  to  study  this  aspect  of  our 
estimators.  In  the  case  of  regression  these  studies  will  not  be 
simple,  however,  since  the  breakdown  point  will  presumably  depend 
on  the  actual  carrier  values. 

4.  CONCLUSIONS  AND  AN  EXAMPLE 

We have discussed an approach to robust regression estimation based on robust covariance estimation. We have seen in Section 2 that if we adopt an affine equivariant M-estimator on the covariance side, we are doing nothing else but weighted least squares on the regression side. The weights depend on two quantities:

(1) the standardized residual,

(2) the standardized norm in the carrier space.

Both of these are estimated robustly and the sum of their squares is the quantity that matters. This particular form of the weight function is rather unique and symmetric with respect to the notions "response" and "explanatory". This symmetry makes the choice attractive. The covariance estimator fits a linear model to the bulk of the data. Points far out in the space of explanatory variables are identified as not belonging to the majority of the data, just as are points which do not fit the linear structure well. This is the primary difference from other bounded influence methods, which try to identify only the second class of outliers.

Our particular estimator seems to downweight points far out in the carrier space too much. This can possibly be corrected by using a different weight function.

Let us finally look at an example. We choose the stack loss data given in Daniel and Wood (1982, p. 61), involving three independent variables

$x_1$ = air flow,
$x_2$ = cooling water temperature, and
$x_3$ = acid concentration.

The response variable is the stack loss. The sample size $n$ is 21. For our covariance estimator we use the weight function $u(x) = \psi_k(x)/x$ ($x \neq 0$), where $\psi_k(x) = \min(k, \max(-k, x))$ is Huber's $\psi$-function. The tuning constant $k$ is chosen as 5.0.

On that particular data set the weights determined by the covariance estimator after 20 iterations turn out to be small for the first four and the last point. Figure I shows a plot of the residuals of this fit against the first variable $x_1$. The weight is indicated by the area of the point. We can identify the 5 outlying points, and we can also appreciate the comment made by Daniel & Wood that one of these points fits the plane determined by the rest quite well.


The final conclusion of Daniel and Wood is that $x_1$ and $x_2$ should be included and that $x_3$ can be dropped. Our analysis supports those findings and in fact recovers the final fit given by Daniel & Wood.